DATA 698 : Capstone Research Project
Capstone Project on CPI and Employment
DATA 698 : Capstone Research Project
- 1 Overview
- 2 Capstone Project on CPI and Employment
- 3 Data Preparation
- 4 Employment vs CPI - by Industry and Area
- 5 References
1 Overview
We are in totally different era of twenty first century, and it gives us very rare situation where any positive news would help the humankind. We want to use the historical CPI data and find the relation of it with the employment, in hope that we would have some positive news on employment by following the trend of the data in past.
We feel that lower CPI would result in more job opportunity, as the it gives space for more competition in small business across sectors.
2 Capstone Project on CPI and Employment
- The estimates of employment for 1998-2006 are based on the 2002 North American Industry Classification System (NAICS). The estimates for 2007-2010 are based on the 2007 NAICS. The estimates for 2011-2016 are based on the 2012 NAICS. The estimates for 2017 forward are based on the 2017 NAICS.
- Excludes limited partners.
- Under the 2007 NAICS, internet publishing and broadcasting was reclassified to other information services.
- (NA) Not available.
- (NM) Not meaningful.
- Not shown to avoid disclosure of confidential information; estimates are included in higher-level totals.
- Estimate for employment suppressed to cover corresponding estimate for earnings. Estimates for this item are included in the total.
Last updated: September 24, 2019- new statistics for 2018; revised statistics for 2014-2017.
3 Data Preparation
Load the required libraries
#packages <- c("R.rsp","pandocfilters","knitr","plyr","tidyr","dplyr","ggplot2","plotly","sqldf","MASS","reshape2","Amelia","mice","googleVis","stringi","ROCR","scatterplot3d","visdat","gridExtra","sjmisc")
#install.packages (packages, repos="http://cran.us.r-project.org", dependencies=TRUE, lib="C:/Program Files/R/R-3.6.3/library")
#invisible (lapply (packages, library, character.only = T))
#install.packages("devtools", lib="C:/Program Files/R/R-3.6.3/library")
#devtools::install_github("hadley/ggplot2")
library(tidyverse)
library(kableExtra)
library(lubridate)
library(forecast)
library(stringi)
library(dplyr)3.1 Load Employment Datasets
file_path <- "../data/cpi/SAINC/"
EMP_DATA_FILE <- paste0(file_path,"EMP9818.csv")
EMP_DATA <- read_delim(EMP_DATA_FILE, delim = ",", col_names = TRUE, trim_ws= TRUE)
#,skip = 6,col_types = list(col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character()))
# names(SAP_INV_DAT) = c("DROP","I_BILLDATE","I_ORG","I_DC","I_DOCCA","I_BILLTYPE","I_ORDER_REA","I_DIV","I_MATYPE","I_BILL_QTY","BLANK")
# SAP_INV_DAT<- SAP_INV_DAT[-which(is.na(SAP_INV_DAT$I_ORG)),] %>% .[-str_which(trimws(.$I_ORG),'SOrg.|-----'),c(-1,-11)]
# 3
# SAP_INV_DAT$I_BILLDATE <- date(parse_datetime(SAP_INV_DAT$I_BILLDATE, "%m/%d/%Y"))
#
# head(SAP_INV_DAT)
summary(EMP_DATA)## GeoFIPS GeoName Region TableName
## Length:7084 Length:7084 Min. :1.000 Length:7084
## Class :character Class :character 1st Qu.:3.000 Class :character
## Mode :character Mode :character Median :5.000 Mode :character
## Mean :4.475
## 3rd Qu.:6.000
## Max. :8.000
## NA's :122
## LineCode IndustryClassification Description Unit
## Min. : 10.0 Length:7084 Length:7084 Length:7084
## 1st Qu.: 517.0 Class :character Class :character Class :character
## Median : 712.5 Mode :character Mode :character Mode :character
## Mean : 872.1
## 3rd Qu.:1103.0
## Max. :2012.0
## NA's :4
## 1998 1999 2000 2001
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2002 2003 2004 2005
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2006 2007 2008 2009
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2010 2011 2012 2013
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2014 2015 2016 2017
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2018
## Length:7084
## Class :character
## Mode :character
##
##
##
##
# Reading the Category and detail Desc for ref.
EMP_DEF_FILE <- paste0(file_path,"SAEMP25N__definition.xml")
IND_DESC <- XML::xmlToDataFrame(EMP_DEF_FILE)
names(IND_DESC) <- c("LineCode","Description")
glimpse(IND_DESC)## Observations: 118
## Variables: 3
## $ LineCode <fct> 10, 20, 40, 50, 60, 70, 80, 90, 100, 101, 102, 103, 200...
## $ Description <fct> "Total employment (number of jobs)", "Wage and salary e...
## $ NA <fct> "A count of jobs, both full-time and part-time. It incl...
3.2 Show Relevant Employment Data
# MN-MAIN | NN - RANGE | NL - LAST |
EMAP_CAT_LAST <- EMP_DATA[,c(2,5,6,7)] %>%
filter(GeoName=="United States") %>%
mutate(ISM = ifelse(IndustryClassification=="...","M","N")) %>%
separate(IndustryClassification, c("From", "To")) %>%
mutate(ISML = ifelse(is.na(To)==T,"L","N")) %>%
filter(ISML %in% c("L","N")) %>%
select(.,LineCode,Description)
#DT::datatable(EMAP_CAT_LAST)
EMP_DAT_CAT <- EMP_DATA[,c(2,5,9:29)] %>%
left_join(EMAP_CAT_LAST,.,by="LineCode")
glimpse(EMP_DAT_CAT)## Observations: 7,080
## Variables: 24
## $ LineCode <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ Description <chr> "Total employment (number of jobs)", "Total employment ...
## $ GeoName <chr> "United States", "Alabama", "Alaska", "Arizona", "Arkan...
## $ `1998` <chr> "158481200", "2361892", "382166", "2616288", "1445536",...
## $ `1999` <chr> "161531300", "2378217", "381307", "2695892", "1460374",...
## $ `2000` <chr> "165370800", "2392225", "389734", "2801510", "1482449",...
## $ `2001` <chr> "165522200", "2376053", "394565", "2829002", "1482587",...
## $ `2002` <chr> "165095100", "2364828", "402187", "2847095", "1478929",...
## $ `2003` <chr> "165921500", "2371430", "405621", "2917121", "1482035",...
## $ `2004` <chr> "168839700", "2425649", "413864", "3041476", "1505095",...
## $ `2005` <chr> "172338400", "2486833", "421419", "3219820", "1537680",...
## $ `2006` <chr> "175868600", "2545556", "431320", "3375218", "1567682",...
## $ `2007` <chr> "179543700", "2604078", "439825", "3465075", "1582858",...
## $ `2008` <chr> "179213900", "2582591", "443538", "3402808", "1579283",...
## $ `2009` <chr> "173636700", "2479507", "442447", "3228493", "1542944",...
## $ `2010` <chr> "172901700", "2460298", "443904", "3181571", "1541272",...
## $ `2011` <chr> "176091700", "2497933", "450364", "3239045", "1561948",...
## $ `2012` <chr> "178979700", "2503678", "459222", "3295537", "1565142",...
## $ `2013` <chr> "182325100", "2523338", "461110", "3371219", "1569249",...
## $ `2014` <chr> "186233800", "2551872", "461327", "3448173", "1587414",...
## $ `2015` <chr> "190315800", "2586885", "461767", "3548174", "1610779",...
## $ `2016` <chr> "193371900", "2619154", "457371", "3646604", "1629237",...
## $ `2017` <chr> "196825300", "2653968", "456799", "3751283", "1644432",...
## $ `2018` <chr> "200746000", "2691517", "459178", "3859137", "1663188",...
#DT::datatable(EMP_DAT_CAT)
temp_name <- names(EMP_DAT_CAT [,c(4:24)])
EMP_DAT_CAT_SAN <- data.frame(lapply(EMP_DAT_CAT [,c(4:24)],function(x) {
gsub("(D)|(T)",NA,x)
}))
names(EMP_DAT_CAT_SAN) = temp_name
#DT::datatable(EMP_DAT_CAT_SAN)
EMP_DAT_CAT_SAN <- cbind(EMP_DAT_CAT[,c(1:3)],EMP_DAT_CAT_SAN)
#DT::datatable(EMP_DAT_CAT_SAN)3.2.1 Employment from Select Industry by Area
3.2.1.1 NY - New York
EMP_DAT_CAT_SAN_NY <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="New York") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_NY %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.2 MI - Michigan
EMP_DAT_CAT_SAN_MI <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Michigan") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MI %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.3 OR - Oregon
EMP_DAT_CAT_SAN_OR <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Oregon") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_OR %>%
#sample_frac(0.33) %>%
DT::datatable()3.3 Load CPI Data
file_path <- "../data/cpi/archive/"
## Read files named *.csv
filenames <- list.files(path=file_path,
pattern="*.csv",
full.names = TRUE)
#print(filenames)
##Create list of data frame names without the ".csv" part
names <- gsub(pattern='\\.',
replacement='_', # remove the last 4 characters from filename (_clean)
x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
basename(filenames) # get filename without the directory file path
)
)
)
#print(names)
###Load all files
for(i in filenames){
## Create list of data frame names without the ".csv" part
name <- gsub(pattern='\\.',
replacement='_', # remove the last 4 characters from filename (_clean)
x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
basename(i) # get filename without the directory file path
)
)
)
print(name)
assign(x=name, value=data.frame(read.csv(i)))
#DT::datatable(name)
}## [1] "cu_area"
## [1] "cu_base"
## [1] "cu_data_0_Current"
## [1] "cu_data_1_AllItems"
## [1] "cu_data_10_OtherWest"
## [1] "cu_data_11_USFoodBeverage"
## [1] "cu_data_12_USHousing"
## [1] "cu_data_13_USApparel"
## [1] "cu_data_14_USTransportation"
## [1] "cu_data_15_USMedical"
## [1] "cu_data_16_USRecreation"
## [1] "cu_data_17_USEducationAndCommunication"
## [1] "cu_data_18_USOtherGoodsAndServices"
## [1] "cu_data_19_PopulationSize"
## [1] "cu_data_2_Summaries"
## [1] "cu_data_20_USCommoditiesServicesSpecial"
## [1] "cu_data_3_AsizeNorthEast"
## [1] "cu_data_4_AsizeNorthCentral"
## [1] "cu_data_5_AsizeSouth"
## [1] "cu_data_6_AsizeWest"
## [1] "cu_data_7_OtherNorthEast"
## [1] "cu_data_8_OtherNorthCentral"
## [1] "cu_data_9_OtherSouth"
## [1] "cu_footnote"
## [1] "cu_item"
## [1] "cu_period"
## [1] "cu_periodicity"
## [1] "cu_series"
3.4 Show CPI Reference Data
3.4.1 Area
3.4.2 Base
3.4.3 Item
3.4.4 Periodicity
3.4.5 Series
3.4.6 Period
3.5 Combined Reference Data
3.6 Show CPI Data from Select Industry
3.6.1 CPI - US Food Beverages
FoodItemCode <- cu_reference_data %>%
filter(grepl('SAF|SEF|SS0|SS1|SS20', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((FoodItemCode))
cu_data_USFoodBeverage = cu_data_0_Current %>%
inner_join(FoodItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USFoodBeverage))3.6.2 CPI - US Housing
HousingItemCode <- cu_reference_data %>%
#filter(grepl('SAH|SEH', item_code, fixed = TRUE))
filter(grepl('SAH|SEH', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((HousingItemCode))
cu_data_USHousing = cu_data_0_Current %>%
inner_join(HousingItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
#filter(item_code %like% c('SAH','SEH')) %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USHousing))3.6.3 CPI - US Transportation
TransportItemCode <- cu_reference_data %>%
filter(grepl('SAT|SET|SS4|SS52|SS53|SAS24|SAS4', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((TransportItemCode))
cu_data_USTransportation = cu_data_0_Current %>%
inner_join(TransportItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USTransportation))3.6.4 CPI - US Medical
MedicalItemCode <- cu_reference_data %>%
filter(grepl('SAM|SEM|SS57', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((MedicalItemCode))
cu_data_USMedical = cu_data_0_Current %>%
inner_join(MedicalItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USMedical))3.6.5 CPI - US Education And Communication
EducationItemCode <- cu_reference_data %>%
filter(grepl('SAE|SEE|SSE|SS27', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((EducationItemCode))
cu_data_USEducationAndCommunication = cu_data_0_Current %>%
inner_join(EducationItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USEducationAndCommunication))4 Employment vs CPI - by Industry and Area
4.1 Food & Beverages
4.1.1 NY - New York
cu_emp_data_USFoodBeverage_NY = cu_data_USFoodBeverage %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USFoodBeverage_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Food and beverage stores
#EMP_DAT_CAT_SAN_USFoodBeverage_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode==705)%>%
# mutate(parent_line_code = 705)
cpi_emp_data_USFoodBeverage_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USFoodBeverage_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.1.2 MI - Michigan
cu_emp_data_USFoodBeverage_MI = cu_data_USFoodBeverage %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USFoodBeverage_MI %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Food and beverage stores
#EMP_DAT_CAT_SAN_USFoodBeverage_MI = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode==705)%>%
# mutate(parent_line_code = 705)
cpi_emp_data_USFoodBeverage_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USFoodBeverage_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.1.3 OR - Oregon
cu_emp_data_USFoodBeverage_OR = cu_data_USFoodBeverage %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USFoodBeverage_MI %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Food and beverage stores
#EMP_DAT_CAT_SAN_USFoodBeverage_MI = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode==705)%>%
# mutate(parent_line_code = 705)
cpi_emp_data_USFoodBeverage_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USFoodBeverage_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.2 Housing
4.2.1 NY - New York
cu_emp_data_USHousing_NY = cu_data_USHousing %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USHousing_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Real estate and rental and leasing
#EMP_DAT_CAT_SAN_USHousing_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1100,1101,1102,1103)) %>%
# mutate(parent_line_code = 1100)
cpi_emp_data_USHousing_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USHousing_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.2 MI - Michigan
cu_emp_data_USHousing_MI = cu_data_USHousing %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USHousing_MI %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Real estate and rental and leasing
#EMP_DAT_CAT_SAN_USHousing_MI = EMP_DAT_CAT_SAN_MI %>%
# filter(LineCode %in% c(1100,1101,1102,1103)) %>%
# mutate(parent_line_code = 1100)
cpi_emp_data_USHousing_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USHousing_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.3 OR - Oregon
cu_emp_data_USHousing_OR = cu_data_USHousing %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USHousing_MI %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Real estate and rental and leasing
#EMP_DAT_CAT_SAN_USHousing_MI = EMP_DAT_CAT_SAN_MI %>%
# filter(LineCode %in% c(1100,1101,1102,1103)) %>%
# mutate(parent_line_code = 1100)
cpi_emp_data_USHousing_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USHousing_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.3 Transportation
4.3.1 NY - New York
cu_emp_data_USTransportation_NY = cu_data_USTransportation %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USTransportation_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Transportation and warehousing
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(800,801,802,803,804,805,806,807,808,809,811)) %>%
# mutate(parent_line_code = 800)
cpi_emp_data_USTransportation_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USTransportation_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.2 MI - Michigan
cu_emp_data_USTransportation_MI = cu_data_USTransportation %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USTransportation_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Transportation and warehousing
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(800,801,802,803,804,805,806,807,808,809,811)) %>%
# mutate(parent_line_code = 800)
cpi_emp_data_USTransportation_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USTransportation_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.3 OR - Oregon
cu_emp_data_USTransportation_OR = cu_data_USTransportation %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USTransportation_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Transportation and warehousing
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(800,801,802,803,804,805,806,807,808,809,811)) %>%
# mutate(parent_line_code = 800)
cpi_emp_data_USTransportation_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USTransportation_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.4 Medical
4.4.1 NY - New York
cu_emp_data_USMedical_NY = cu_data_USMedical %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USMedical_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Health care and social assistance
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1600,1601,1602,1603,1604)) %>%
# mutate(parent_line_code = 1600)
cpi_emp_data_USMedical_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USMedical_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.2 MI - Michigan
cu_emp_data_USMedical_MI = cu_data_USMedical %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USMedical_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Health care and social assistance
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1600,1601,1602,1603,1604)) %>%
# mutate(parent_line_code = 1600)
cpi_emp_data_USMedical_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USMedical_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.3 OR - Oregon
cu_emp_data_USMedical_OR = cu_data_USMedical %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USMedical_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Health care and social assistance
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1600,1601,1602,1603,1604)) %>%
# mutate(parent_line_code = 1600)
cpi_emp_data_USMedical_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USMedical_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.5 Education & Communication
4.5.1 NY - New York
cu_data_USEducationAndCommunication_NY = cu_data_USEducationAndCommunication %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE', emp_line_code = 1500) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_data_USEducationAndCommunication_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Educational services
#EMP_DAT_CAT_SAN_USEducationAndCommunication_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1500)) %>%
# mutate(parent_line_code = 1500)
cpi_emp_data_USEducationAndCommunication_NY = merge(EMP_DAT_CAT_SAN_NY, cu_data_USEducationAndCommunication_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.2 MI - Michigan
cu_data_USEducationAndCommunication_MI = cu_data_USEducationAndCommunication %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE', emp_line_code = 1500) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_data_USEducationAndCommunication_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Educational services
#EMP_DAT_CAT_SAN_USEducationAndCommunication_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1500)) %>%
# mutate(parent_line_code = 1500)
cpi_emp_data_USEducationAndCommunication_MI = merge(EMP_DAT_CAT_SAN_MI, cu_data_USEducationAndCommunication_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.3 OR - Oregon
cu_data_USEducationAndCommunication_OR = cu_data_USEducationAndCommunication %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE', emp_line_code = 1500) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_data_USEducationAndCommunication_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Educational services
#EMP_DAT_CAT_SAN_USEducationAndCommunication_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1500)) %>%
# mutate(parent_line_code = 1500)
cpi_emp_data_USEducationAndCommunication_OR = merge(EMP_DAT_CAT_SAN_OR, cu_data_USEducationAndCommunication_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_OR %>%
# sample_frac(0.33) %>%
DT::datatable()